Chronic kidney disease (CKD) is a major global health challenge because of its silent progression and lack of early symptoms, which often lead to late-stage diagnosis. This paper presents an enhanced machine learning framework that uses ensemble learning techniques for the early prediction of CKD. The proposed model integrates traditional clinical parameters with additional lifestyle-related risk factors such as smoking habits, dietary patterns, water intake, family history, and physical activity levels. It also incorporates CKD stage identification to provide more detailed diagnostic insight. Existing systems face challenges such as missing values, imbalanced data, irrelevant features, noise, bias, and overfitting. To overcome these issues, the proposed framework applies robust data preprocessing and feature selection so that key health indicators are used effectively. The hybrid ensemble-based approach aims to improve prediction accuracy, precision, F1-score, reliability, and generalizability compared with single-model systems. Experimental results on a benchmark CKD dataset demonstrate the effectiveness of the proposed system at predicting the disease in its early stages and highlight the significant influence of lifestyle factors on disease risk prediction. Within the proposed framework, AdaBoost achieved an average accuracy of 98.9%, supporting early identification of CKD and early diagnostic support for healthcare systems.
Introduction
Chronic Kidney Disease (CKD) is a progressive condition that often remains undiagnosed in early stages due to the absence of symptoms. Early detection is crucial to improve patient outcomes and reduce the burden on healthcare systems.
Role of Machine Learning
Machine learning (ML) offers promising solutions for early CKD detection. Traditional diagnosis methods are manual and subjective, whereas ML can:
Analyze large datasets,
Identify patterns,
Make faster and more accurate predictions.
However, existing ML models for CKD suffer from:
Class imbalance,
Missing data,
Overfitting,
Lack of real-world features (e.g., lifestyle data).
Proposed Methodology
A hybrid ensemble learning approach is proposed to enhance CKD prediction accuracy. The key components include:
1. Dataset
Based on the UCI CKD dataset
Includes clinical features (e.g., blood pressure, haemoglobin) and lifestyle factors (e.g., smoking, water intake, exercise).
2. Data Preprocessing
Missing values handled via mean/mode imputation.
Categorical data encoded.
Features normalized and irrelevant data removed.
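The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration on toy data, not the authors' pipeline: the column names, the choice of min-max scaling for "normalized", and ordinal encoding for categorical values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder

# Toy records; the column names are hypothetical stand-ins for CKD attributes.
df = pd.DataFrame({
    "blood_pressure": [80.0, np.nan, 70.0, 90.0],
    "haemoglobin": [15.4, 11.3, np.nan, 9.6],
    "smoking": pd.Series(["no", "yes", np.nan, "yes"], dtype=object),
})

numeric_cols = ["blood_pressure", "haemoglobin"]
categorical_cols = ["smoking"]

# Numeric features: fill gaps with the column mean, then scale to [0, 1].
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", MinMaxScaler()),
])

# Categorical features: fill gaps with the mode, then map categories to integers.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OrdinalEncoder()),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", categorical_steps, categorical_cols),
])

# Rows stay in order; columns come out as numeric_cols followed by categorical_cols.
X = preprocess.fit_transform(df)
print(X)
```

In practice the transformer would be fitted on the training split only and then applied to the test split, so that imputation and scaling statistics do not leak from held-out data.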
3. Class Imbalance Handling
SMOTE (Synthetic Minority Over-sampling Technique) used to balance CKD and non-CKD samples.
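To make the idea behind SMOTE concrete, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified NumPy re-implementation of that core idea for illustration only, not the imbalanced-learn implementation and not necessarily what the authors ran; the function name `smote_like_oversample` and the toy data are hypothetical.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Pairwise Euclidean distances within the minority class.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))
        neighbour = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[neighbour] - X_min[base])
    return synthetic

# Toy minority-class samples with two features (k must be smaller than their count).
X_minority = np.array([[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]])
X_new = smote_like_oversample(X_minority, n_new=6)
print(X_new.shape)
```

Oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class distribution.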
4. Feature Selection
Techniques like correlation analysis and recursive feature elimination are used to select relevant attributes, reducing overfitting and improving interpretability.
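One way the two named techniques could be combined is sketched below: a correlation filter first removes near-duplicate features, then recursive feature elimination (RFE) ranks the remainder. The synthetic data, the 0.9 correlation threshold, the logistic-regression estimator, and the target of four features are assumptions for the example, not values reported in the study.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the CKD feature matrix (hypothetical column names).
X_arr, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, n_redundant=2, random_state=0
)
columns = [f"f{i}" for i in range(8)]
X = pd.DataFrame(X_arr, columns=columns)

# Step 1: correlation filter. For each highly correlated pair, drop the later column.
corr = X.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
X_filtered = X.drop(columns=to_drop)

# Step 2: RFE repeatedly fits the estimator and discards the weakest feature.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X_filtered, y)
selected = list(X_filtered.columns[selector.support_])
print("dropped by correlation:", to_drop)
print("kept by RFE:", selected)
```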
5. Ensemble Models
Gradient Boosting, XGBoost, and AdaBoost are trained as the ensemble classifiers.
Hyperparameters of each ensemble model are tuned for better performance.
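A hedged sketch of how boosting ensembles might be tuned is shown below, using scikit-learn's `AdaBoostClassifier` and `GradientBoostingClassifier` with a small grid search. The synthetic data, the parameter grids, and the F1 scoring choice are assumptions for illustration; the study does not state which search method or ranges were used. XGBoost is left out only because it requires the separate `xgboost` package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data standing in for the preprocessed CKD features.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Each entry pairs a model with a small, illustrative hyperparameter grid.
candidates = {
    "AdaBoost": (
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]},
    ),
    "GradientBoosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [2, 3]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 3-fold cross-validated search on the training split, scored by F1.
    search = GridSearchCV(model, grid, cv=3, scoring="f1")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "test_f1": search.score(X_test, y_test),  # uses the same F1 scorer
    }

for name, info in results.items():
    print(name, info)
```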
6. Model Evaluation
Performance measured using:
Accuracy
Precision
Recall
F1-Score
AUC-ROC
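For reference, the first four metrics can be computed directly from confusion-matrix counts, as in the minimal sketch below. The counts are made up for the example and are not results from the study. AUC-ROC is not included because it needs the model's ranked scores or probabilities rather than a single confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts: 45 true positives, 5 false positives,
# 5 false negatives, 45 true negatives.
metrics = classification_metrics(tp=45, fp=5, fn=5, tn=45)
print(metrics)
```

Because F1 is the harmonic mean of precision and recall, it always lies between the two, which makes it a quick consistency check on a reported row of metrics.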
Results & Findings
AdaBoost achieved the best performance:
Accuracy: 98.9%
Precision, Recall, F1-score: 0.99
XGBoost closely followed with 98.4% accuracy.
Baseline models performed decently but were affected by class imbalance and lacked robustness.
Inclusion of lifestyle factors significantly improved model accuracy.
Confusion matrices and classification reports confirm the superior performance of ensemble models.
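Performance Comparison Table

| Model         | Accuracy | Precision | Recall | F1-Score |
|---------------|----------|-----------|--------|----------|
| KNN           | 0.894    | 0.834     | 0.80   | 0.88     |
| Decision Tree | 0.900    | 0.95      | 0.91   | 0.93     |
| Naive Bayes   | 0.860    | 0.87      | 0.89   | 0.88     |
| Random Forest | 0.920    | 0.96      | 0.94   | 0.95     |
| AdaBoost      | 0.989    | 0.99      | 0.99   | 0.99     |
| XGBoost       | 0.984    | 0.98      | 0.98   | 0.98     |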
Conclusion
Chronic Kidney Disease (CKD) remains a major health challenge due to its silent progression and lack of early symptoms. In this study, a hybrid ensemble machine learning framework was proposed to enhance the early prediction of CKD. The model incorporated both clinical and lifestyle features, with data preprocessing, feature selection, and class balancing using SMOTE to improve prediction reliability. Traditional machine learning models such as KNN, Decision Tree, and Naive Bayes were implemented first, but showed limited performance due to data imbalance and lack of robust learning. The proposed ensemble classifiers (AdaBoost, XGBoost, and Gradient Boosting) demonstrated significant improvements, with AdaBoost achieving the highest accuracy of 98.9%, followed by XGBoost with 98.4%. These models also yielded better precision, recall, and F1-scores, particularly in predicting minority class samples.
Overall, the results validate that integrating ensemble methods with proper data handling significantly enhances CKD prediction accuracy. The proposed framework is scalable, interpretable, and suitable for clinical decision support systems. It demonstrates strong potential for early CKD prediction, but further enhancements are possible.
Future work may focus on integrating real-time Electronic Health Records (EHRs) and health data from wearable devices to improve personalization. Incorporating Explainable AI (XAI) techniques such as SHAP or LIME can increase model transparency. Additionally, extending the system to support multi-disease prediction and deploying it as a mobile or web-based clinical tool could broaden its practical impact. Finally, enabling continuous learning from patient feedback would help the model stay adaptable and effective over the long term.